The effects of sentence length on dependency distance, dependency direction and the implications-Based on a parallel English-Chinese dependency treebank

Authors

  • Jingyang Jiang
  • Haitao Liu
Abstract

Dependency distance is closely related to human working memory capacity, but is also influenced by other non-cognitive factors. Studies of dependency distance contribute to the understanding of the universalities and peculiarities of languages as well as human cognitive processes in language. Forty-two sentence sets were selected from a parallel English–Chinese dependency treebank to examine how the properties of dependency distance change with sentence length in the two languages. It was found that: (1) the probability distribution models of dependency distance of both languages are not affected by either sentence length or the type of language; (2) the quantity of adjacent dependencies in the two languages is identical, but the quantity of adjacent dependencies of Chinese fluctuates within a limited range, while that of English shows a falling tendency; (3) the mean dependency distances (MDDs) of Chinese are always higher than those of English, and both MDDs show slight ascending trends; (4) compared with dependency distance, dependency direction is a more reliable metric for language classification. These findings suggest that: (1) the universal cognition mechanism may be the major factor affecting the general traits of dependency distance, while language-related factors such as sentence length may affect certain traits of dependency distance; and (2) Chinese taxes working memory more than English. © 2015 Elsevier Ltd. All rights reserved.
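The abstract names three measures (mean dependency distance, the quantity of adjacent dependencies, and dependency direction) without stating their formulas. The sketch below shows one way they might be computed for a single sentence. It assumes a commonly used definition that the abstract itself does not give: the dependency distance of a word is the position of its governor minus its own position, and MDD is the mean of the absolute distances over all non-root words. The function names, the `heads` encoding (1-based governor positions, 0 for the root) and the example parse are illustrative assumptions, not taken from the paper.

```python
from typing import List


def dependency_distances(heads: List[int]) -> List[int]:
    """Signed distances (governor position minus dependent position).

    Assumption: heads[i] is the 1-based position of the governor of
    word i + 1, and 0 marks the root, which has no distance.
    """
    return [head - pos for pos, head in enumerate(heads, start=1) if head != 0]


def mean_dependency_distance(heads: List[int]) -> float:
    """Mean of absolute dependency distances over all non-root words."""
    dists = dependency_distances(heads)
    if not dists:  # a one-word sentence has no dependencies
        return 0.0
    return sum(abs(d) for d in dists) / len(dists)


def adjacent_share(heads: List[int]) -> float:
    """Proportion of dependencies linking neighbouring words (|distance| == 1)."""
    dists = dependency_distances(heads)
    if not dists:
        return 0.0
    return sum(1 for d in dists if abs(d) == 1) / len(dists)


def head_final_share(heads: List[int]) -> float:
    """Proportion of dependencies whose governor follows the dependent."""
    dists = dependency_distances(heads)
    if not dists:
        return 0.0
    return sum(1 for d in dists if d > 0) / len(dists)


# Hypothetical parse of "The boy ate an apple":
# The -> boy, boy -> ate, ate = root, an -> apple, apple -> ate
example_heads = [2, 3, 0, 5, 3]
print(mean_dependency_distance(example_heads))
print(adjacent_share(example_heads))
print(head_final_share(example_heads))
```

Averaging these per-sentence values within each sentence-length group would give the kind of length-by-length comparison the abstract describes, although the paper's exact aggregation procedure is not stated here.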

Similar resources

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...


Dependency Relations and Dependency Distance - a statistical view based on Treebank

The dependency relation is the most essential ingredient in a dependency-based theory of syntax. This paper presents some statistical findings on the dependency relation extracted from a Chinese dependency treebank. A sentence in the proposed treebank can easily be converted into a SSyntS graph in Meaning-Text Theory. The statistics on the dependency relation show that modifiers make up 55% of ...


Automatic conversion of a Persian dependency treebank to a constituency treebank

There are two major types of treebanks: dependency-based and constituency-based. Both of them have applications in natural language processing and computational linguistics. Several dependency treebanks have been developed for Persian. However, there is no available big size constituency treebank for this language. In this paper, we aim to propose an algorithm for automatic conversion of a depe...


An improved joint model: POS tagging and dependency parsing

Dependency parsing is a form of syntactic parsing in natural language processing that automatically analyzes the dependency structure of sentences and produces a dependency graph for each input sentence. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing in a pipeline mode. Unfortunately, in pipel...


Feature Engineering in Persian Dependency Parser

A dependency parser is one of the most important fundamental tools in natural language processing: it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free-word-order languages such as Persian. In this paper, a data-driven dependency parser has been developed with the help of phrase-structure parser fo...



Publication date: 2015